120 research outputs found
Freeze then Train: Towards Provable Representation Learning under Spurious Correlations and Feature Noise
The existence of spurious correlations such as image backgrounds in the
training environment can make empirical risk minimization (ERM) perform badly
in the test environment. To address this problem, Kirichenko et al. (2022)
empirically found that the core features that are related to the outcome can
still be learned well even in the presence of spurious correlations. This
opens a promising strategy to first train a feature learner rather than a
classifier, and then perform linear probing (last layer retraining) in the test
environment. However, a theoretical understanding of when and why this approach
works is lacking. In this paper, we find that core features are only learned
well when their associated non-realizable noise is smaller than that of
spurious features, which is not necessarily true in practice. We provide both
theories and experiments to support this finding and to illustrate the
importance of non-realizable noise. Moreover, we propose an algorithm called
Freeze then Train (FTT), which first freezes certain salient features and then
trains the rest of the features using ERM. We theoretically show that FTT
preserves features that are more beneficial to test time probing. Across two
commonly used spurious correlation datasets, FTT outperforms ERM, IRM, JTT and
CVaR-DRO, with substantial improvement in accuracy (by 4.5%) when the feature
noise is large. FTT also performs better on general distribution shift
benchmarks.
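The two-stage recipe behind the abstract — train features, then refit only the last linear layer (linear probing) in the test environment — can be illustrated on toy data. The sketch below is not the authors' implementation: the feature construction, noise levels, and least-squares head are all invented to show why a head fit under a spurious correlation fails at test time while last-layer retraining recovers.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 400

# Toy features: a noisy "core" feature and a "spurious" one that tracks the
# label in the training environment but not in the test environment.
y = rng.integers(0, 2, n).astype(float)
core_tr = y + 0.5 * rng.standard_normal(n)   # causal, but noisier
spur_tr = y + 0.1 * rng.standard_normal(n)   # spurious, low-noise at train time
X_tr = np.column_stack([core_tr, spur_tr])

core_te = y + 0.5 * rng.standard_normal(n)
spur_te = rng.standard_normal(n)             # decorrelated at test time
X_te = np.column_stack([core_te, spur_te])

# A least-squares head fit in the training environment leans on the
# low-noise spurious feature...
w_erm, *_ = np.linalg.lstsq(X_tr, y, rcond=None)
acc_erm = np.mean((X_te @ w_erm > 0.5) == y)

# ...while last-layer retraining (linear probing) on test-environment data
# rebalances the head toward the core feature.
w_probe, *_ = np.linalg.lstsq(X_te, y, rcond=None)
acc_probe = np.mean((X_te @ w_probe > 0.5) == y)

print(acc_erm, acc_probe)
```

In this toy setup the probed head is far more accurate than the head trained under the spurious correlation, which is the failure mode the paper's non-realizable-noise analysis formalizes.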
A study of conceptual language similarity: comparison and evaluation
An interesting line of research in natural language processing (NLP) aims to
incorporate linguistic typology to bridge linguistic diversity and assist the
research of low-resource languages. While most works construct linguistic
similarity measures based on lexical or typological features, such as word
order and verbal inflection, recent work has introduced a novel approach to
defining language similarity based on how languages represent basic concepts, which
is complementary to existing similarity measures. In this work, we study the
conceptual similarity in detail and evaluate it extensively on a binary
classification task.
Crosslingual Transfer Learning for Low-Resource Languages Based on Multilingual Colexification Graphs
In comparative linguistics, colexification refers to the phenomenon of a
lexical form conveying two or more distinct meanings. Existing work on
colexification patterns relies on annotated word lists, limiting scalability
and usefulness in NLP. In contrast, we identify colexification patterns of more
than 2,000 concepts across 1,335 languages directly from an unannotated
parallel corpus. We then propose simple and effective methods to build
multilingual graphs from the colexification patterns: ColexNet and ColexNet+.
ColexNet's nodes are concepts and its edges are colexifications. In ColexNet+,
concept nodes are additionally linked through intermediate nodes, each
representing an ngram in one of 1,334 languages. We use ColexNet+ to train
\overrightarrow{\mbox{ColexNet+}}, high-quality multilingual embeddings that
are well-suited for transfer learning. In our experiments, we first show that
ColexNet achieves high recall on CLICS, a dataset of crosslingual
colexifications. We then evaluate \overrightarrow{\mbox{ColexNet+}} on
roundtrip translation, sentence retrieval and sentence classification and show
that our embeddings surpass several transfer learning baselines. This
demonstrates the benefits of using colexification as a source of information in
multilingual NLP.
Comment: EMNLP 2023 Findings
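The core graph construction — concepts as nodes, an edge whenever one lexical form expresses two or more concepts — can be sketched directly. The word lists below are invented for illustration; ColexNet itself identifies these patterns from an unannotated parallel corpus, not from curated lexicons.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical lexicons: language -> {lexical form -> concepts it expresses}.
lexicons = {
    "swahili": {"tumbo": {"BELLY", "WOMB"}},
    "english": {"belly": {"BELLY"}, "womb": {"WOMB"},
                "bank": {"BANK", "RIVERBANK"}},
}

# A colexification edge links two concepts expressed by the same form;
# we record which (language, form) pairs attest each edge.
edges = defaultdict(set)
for lang, forms in lexicons.items():
    for form, concepts in forms.items():
        for a, b in combinations(sorted(concepts), 2):
            edges[(a, b)].add((lang, form))

print(dict(edges))
```

ColexNet+ additionally inserts the attesting forms (ngrams) as intermediate nodes between concept nodes, rather than collapsing them into edge labels as this sketch does.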
Effects of Meteorology Changes on Inter-Annual Variations of Aerosol Optical Depth and Surface PM2.5 in China—Implications for PM2.5 Remote Sensing
PM2.5 retrieval from satellite-observed aerosol optical depth (AOD) is still challenging due to the strong impact of meteorology. We investigate influences of meteorology changes on the inter-annual variations of AOD and surface PM2.5 in China between 2006 and 2017 using a nested 3D chemical transport model, GEOS-Chem, by fixing emissions at the 2006 level. We then identify major meteorological elements controlling the inter-annual variations of AOD and surface PM2.5 using multiple linear regression. We find larger influences of meteorology changes on trends of AOD than on those of surface PM2.5. On the seasonal scale, meteorology changes are beneficial to AOD and surface PM2.5 reduction in spring (1–50%) but show an adverse effect on aerosol reduction in summer. In addition, major meteorological elements influencing variations of AOD and PM2.5 are similar between spring and fall. In winter, meteorology changes are favorable to AOD reduction (−0.007 yr⁻¹, −1.2% yr⁻¹; p < 0.05) but enhance surface PM2.5 between 2006 and 2017. The difference in winter is mainly attributed to the stable boundary layer that isolates surface PM2.5 from aerosols aloft. The significant decrease in AOD over the years is related to the increase in meridional wind speed at 850 hPa in the North China Plain (NCP) (p < 0.05). The increase in surface PM2.5 in NCP in winter is possibly related to the increased temperature inversion and more stable stratification in the boundary layer. This suggests that previous estimates of wintertime surface PM2.5 using satellite measurements of AOD corrected by meteorological elements should be used with caution. Our findings provide potential meteorological elements that might improve the retrieval of surface PM2.5 from satellite-observed AOD on the seasonal scale.
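The attribution step — regressing an inter-annual AOD series on candidate meteorological elements and ranking them by coefficient magnitude — can be sketched with ordinary least squares. The series and variable names below are synthetic stand-ins; the study uses GEOS-Chem output with emissions fixed at the 2006 level, not random data.

```python
import numpy as np

rng = np.random.default_rng(1)
years = 12  # 2006-2017

# Synthetic standardized predictors (illustrative names only).
wind_v850 = rng.standard_normal(years)   # meridional wind at 850 hPa
inversion = rng.standard_normal(years)   # temperature-inversion strength
rh = rng.standard_normal(years)          # relative humidity

# Synthetic AOD anomaly dominated by the wind term.
aod = -0.8 * wind_v850 + 0.1 * rh + 0.05 * rng.standard_normal(years)

# Multiple linear regression with an intercept column.
X = np.column_stack([np.ones(years), wind_v850, inversion, rh])
coef, *_ = np.linalg.lstsq(X, aod, rcond=None)

# Rank elements by absolute coefficient to find the dominant driver.
names = ["wind_v850", "inversion", "rh"]
ranking = sorted(zip(names, np.abs(coef[1:])), key=lambda t: -t[1])
print(ranking[0][0])
```

With standardized predictors the absolute coefficients are comparable, so the ranking identifies the dominant element in the synthetic series.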
Towards Revealing the Mystery behind Chain of Thought: a Theoretical Perspective
Recent studies have discovered that Chain-of-Thought prompting (CoT) can
dramatically improve the performance of Large Language Models (LLMs),
particularly when dealing with complex tasks involving mathematics or
reasoning. Despite the enormous empirical success, the underlying mechanisms
behind CoT and how it unlocks the potential of LLMs remain elusive. In this
paper, we take a first step towards theoretically answering these questions.
Specifically, we examine the capacity of LLMs with CoT in solving fundamental
mathematical and decision-making problems. We start by giving an impossibility
result showing that any bounded-depth Transformer cannot directly output
correct answers for basic arithmetic/equation tasks unless the model size grows
super-polynomially with respect to the input length. In contrast, we then prove
by construction that autoregressive Transformers of a constant size suffice to
solve both tasks by generating CoT derivations using a commonly-used math
language format. Moreover, we show LLMs with CoT are capable of solving a
general class of decision-making problems known as Dynamic Programming, thus
justifying its power in tackling complex real-world tasks. Finally, extensive
experiments on four tasks show that, while Transformers always fail to predict
the answers directly, they can consistently learn to generate correct solutions
step-by-step given sufficient CoT demonstrations.
Comment: 33 pages
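The contrast the paper studies — emitting an answer directly versus generating a derivation one step at a time — can be made concrete with a toy arithmetic reducer. Each loop iteration rewrites one binary operation and records the intermediate expression, mimicking the shape of a CoT derivation; this reducer is a stand-in for the paper's "commonly-used math language format", not their construction, and it handles only flat integer expressions (integer division, no parentheses).

```python
import re

def cot_steps(expr: str) -> list[str]:
    """Reduce one binary operation per step, recording each intermediate."""
    steps = [expr]
    while re.fullmatch(r"-?\d+", expr) is None:
        # Leftmost multiplication/division first, then addition/subtraction.
        m = re.search(r"(-?\d+)([*/])(-?\d+)", expr) or \
            re.search(r"(-?\d+)([+-])(-?\d+)", expr)
        a, op, b = int(m.group(1)), m.group(2), int(m.group(3))
        val = {"+": a + b, "-": a - b, "*": a * b, "/": a // b}[op]
        expr = expr[:m.start()] + str(val) + expr[m.end():]
        steps.append(expr)
    return steps

print(cot_steps("2+3*4"))   # ['2+3*4', '2+12', '14']
```

A bounded-depth model asked to map `2+3*4` straight to `14` must compose all operations at once, whereas generating the `steps` list lets each step depend only on the previous expression — the intuition behind the paper's constant-size CoT construction.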
A Crosslingual Investigation of Conceptualization in 1335 Languages
Languages differ in how they divide up the world into concepts and words;
e.g., in contrast to English, Swahili has a single concept for `belly' and
`womb'. We investigate these differences in conceptualization across 1,335
languages by aligning concepts in a parallel corpus. To this end, we propose
Conceptualizer, a method that creates a bipartite directed alignment graph
between source language concepts and sets of target language strings. In a
detailed linguistic analysis across all languages for one concept (`bird') and
an evaluation on gold standard data for 32 Swadesh concepts, we show that
Conceptualizer has good alignment accuracy. We demonstrate the potential of
research on conceptualization in NLP with two experiments. (1) We define
crosslingual stability of a concept as the degree to which it has 1-1
correspondences across languages, and show that concreteness predicts
stability. (2) We represent each language by its conceptualization pattern for
83 concepts, and define a similarity measure on these representations. The
resulting measure for the conceptual similarity of two languages is
complementary to standard genealogical, typological, and surface similarity
measures. For four out of six language families, we can assign languages to
their correct family based on conceptual similarity with accuracy between 54%
and 87%.
Comment: ACL 202
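The crosslingual-stability definition — the degree to which a concept has 1-1 correspondences across languages — admits a short sketch: count the fraction of languages in which the concept aligns to exactly one target string. The alignments below are invented for illustration; Conceptualizer derives them from a parallel corpus via its bipartite alignment graph.

```python
# Hypothetical alignments: concept -> {language code -> aligned target strings}.
alignments = {
    "BIRD":  {"deu": {"vogel"}, "swh": {"ndege"}, "fra": {"oiseau"}},
    "BELLY": {"deu": {"bauch"}, "swh": {"tumbo"}, "fra": {"ventre", "abdomen"}},
}

def stability(concept: str) -> float:
    """Fraction of languages where the concept has a 1-1 correspondence."""
    targets = alignments[concept]
    return sum(len(strings) == 1 for strings in targets.values()) / len(targets)

print(stability("BIRD"), stability("BELLY"))   # 1.0 0.6666666666666666
```

Under this measure a concrete concept like `BIRD` scores higher than one that fragments into several target strings, matching the paper's finding that concreteness predicts stability.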